Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
We consider the question of whether thermodynamic macrostates are objective consequences of dynamics, or subjective reflections of our ignorance of a physical system. We argue that they are both; more specifically, that the set of macrostates forms the unique maximal partition of phase space which (1) is consistent with our observations (a subjective fact about our ability to observe the system) and (2) obeys a Markov process (an objective fact about the system’s dynamics). We review the ideas of computational mechanics, an information-theoretic method for finding optimal causal models of stochastic processes, and argue that macrostates coincide with the “causal states” of computational mechanics. Defining a set of macrostates thus consists of an inductive process where we start with a given set of observables, and then refine our partition of phase space until we reach a set of states which predict their own future, i.e. which are Markovian. Macrostates arrived at in this way are provably optimal statistical predictors of the future values of our observables.more » « lessFree, publicly-accessible full text available February 1, 2026
-
In New Mexico and many other jurisdictions, judges may detain defendants pretrial if the prosecutor proves, through clear and convincing evidence, that releasing them would pose a danger to the public. However, some policymakers argue that certain classes of defendants should have a “rebuttable presumption” of dangerousness, shifting the burden of proof to the defense. Using data on over 15,000 felony defendants who were released pretrial in a 4‐year period in New Mexico, we measure how many of them would have been detained by various presumptions, and what fraction of these defendants in fact posed a danger in the sense that they were charged with a new crime during pretrial supervision. We consider presumptions based on the current charge, past convictions, past failures to appear, past violations of conditions of release, and combinations of these drawn from recent legislative proposals. We find that for all these criteria, at most 8% of the defendants they identify are charged pretrial with a new violent crime (felony or misdemeanor), and at most 5% are charged with a new violent felony. The false‐positive rate, that is, the fraction of defendants these policies would detain who are not charged with any new crime pretrial, ranges from 71% to 90%. The broadest legislative proposals, such as detaining all defendants charged with a violent felony, are little more accurate than detaining a random sample of defendants released under the current system, and would jail 20 or more people to prevent a single violent felony. We also consider detention recommendations based on risk scores from the Arnold Public Safety Assessment (PSA). Among released defendants with the highest risk score and the “violence flag,” 7% are charged with a new violent felony and 71% are false positives. We conclude that these criteria for rebuttable presumptions do not accurately target dangerous defendants: they cast wide nets and recommend detention for many pretrial defendants who do not pose a danger to the public.more » « less
-
Fu, Feng (Ed.)Network science has increasingly become central to the field of epidemiology and our ability to respond to infectious disease threats. However, many networks derived from modern datasets are not just large, but dense, with a high ratio of edges to nodes. This includes human mobility networks where most locations have a large number of links to many other locations. Simulating large-scale epidemics requires substantial computational resources and in many cases is practically infeasible. One way to reduce the computational cost of simulating epidemics on these networks is sparsification , where a representative subset of edges is selected based on some measure of their importance. We test several sparsification strategies, ranging from naive thresholding to random sampling of edges, on mobility data from the U.S. Following recent work in computer science, we find that the most accurate approach uses the effective resistances of edges, which prioritizes edges that are the only efficient way to travel between their endpoints. The resulting sparse network preserves many aspects of the behavior of an SIR model, including both global quantities, like the epidemic size, and local details of stochastic events, including the probability each node becomes infected and its distribution of arrival times. This holds even when the sparse network preserves fewer than 10% of the edges of the original network. In addition to its practical utility, this method helps illuminate which links of a weighted, undirected network are most important to disease spread.more » « less
-
Bojanczyk, M. et (Ed.)Embedding graphs in a geographical or latent space, i.e. inferring locations for vertices in Euclidean space or on a smooth manifold or submanifold, is a common task in network analysis, statistical inference, and graph visualization. We consider the classic model of random geometric graphs where n points are scattered uniformly in a square of area n, and two points have an edge between them if and only if their Euclidean distance is less than r. The reconstruction problem then consists of inferring the vertex positions, up to the symmetries of the square, given only the adjacency matrix of the resulting graph. We give an algorithm that, if r = n^α for α > 0, with high probability reconstructs the vertex positions with a maximum error of O(n^β) where β = 1/2-(4/3)α, until α ≥ 3/8 where β = 0 and the error becomes O(√{log n}). This improves over earlier results, which were unable to reconstruct with error less than r. Our method estimates Euclidean distances using a hybrid of graph distances and short-range estimates based on the number of common neighbors. We extend our results to the surface of the sphere in ℝ³ and to hypercubes in any constant dimension.more » « less
-
For the tensor PCA (principal component analysis) problem, we propose a new hierarchy of increasingly powerful algorithms with increasing runtime. Our hierarchy is analogous to the sumof-squares (SOS) hierarchy but is instead inspired by statistical physics and related algorithms such as belief propagation and AMP (approximate message passing). Our level-ℓ algorithm can be thought of as a linearized message-passing algorithm that keeps track of ℓ-wise dependencies among the hidden variables. Specifically, our algorithms are spectral methods based on the Kikuchi Hessian, which generalizes the well-studied Bethe Hessian to the higher-order Kikuchi free energies. It is known that AMP, the flagship algorithm of statistical physics, has substantially worse performance than SOS for tensor PCA. In this work we ‘redeem’ the statistical physics approach by showing that our hierarchy gives a polynomial-time algorithm matching the performance of SOS. Our hierarchy also yields a continuum of subexponential-time algorithms, and we prove that these achieve the same (conjecturally optimal) tradeoff between runtime and statistical power as SOS. Our proofs are much simpler than prior work, and also apply to the related problem of refuting random k-XOR formulas. The results we present here apply to tensor PCA for tensors of all orders, and to k-XOR when k is even. Our methods suggest a new avenue for systematically obtaining optimal algorithms for Bayesian inference problems, and our results constitute a step toward unifying the statistical physics and sum-of-squares approaches to algorithm design.more » « less
An official website of the United States government

Full Text Available